Part of Speech Tagging Bilingual Speech Transcripts with Intrasentential Model Switching
Rodrigues, Paul (University of Maryland) | Kübler, Sandra (Indiana University)
This paper investigates incremental part of speech tagging for speech transcripts that contain multilingual intrasentential code-mixing. It compares the accuracy of a monolithic tagging model trained on a heterogeneous-language dataset with that of a model that switches dynamically between two homogeneous-language tagging models using word-by-word language identification. We find that the dynamic model, even though it is presented with a smaller context consisting of sentence fragments, matches the accuracy of the monolithic code-mixing model, which has access to the larger context. Our system is modular and is designed to be expanded to many-language code-mixing.
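The model-switching idea can be illustrated with a minimal sketch. It is not the authors' implementation: the word list, the lexicon-lookup "taggers", the tag labels, and the names `identify_language`, `toy_tagger`, and `tag_code_mixed` are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow: each word is assigned a language, consecutive words in the same language are grouped into a fragment, and each fragment is passed to the tagger for that language, so no tagger sees context beyond its own fragment.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]
Tagger = Callable[[List[str]], Tagged]

# Hypothetical toy word list standing in for a real word-level language identifier.
ES_WORDS = {"pero", "no", "quiero", "la", "casa", "y"}


def identify_language(word: str) -> str:
    """Assumed word-by-word language ID: Spanish if in the toy list, else English."""
    return "es" if word.lower() in ES_WORDS else "en"


def toy_tagger(lexicon: Dict[str, str], default: str = "NOUN") -> Tagger:
    """Build a stand-in monolingual tagger that looks each word up in a lexicon."""
    def tag(words: List[str]) -> Tagged:
        return [(w, lexicon.get(w.lower(), default)) for w in words]
    return tag


# One homogeneous-language tagger per language; adding a language adds an entry.
TAGGERS: Dict[str, Tagger] = {
    "en": toy_tagger({"i": "PRON", "want": "VERB", "to": "PART", "go": "VERB"}),
    "es": toy_tagger({"pero": "CONJ", "no": "ADV", "quiero": "VERB"}),
}


def tag_code_mixed(words: List[str],
                   taggers: Dict[str, Tagger] = TAGGERS,
                   lang_id: Callable[[str], str] = identify_language) -> Tagged:
    """Split the sentence into same-language fragments and tag each separately."""
    tagged: Tagged = []
    # groupby merges only consecutive words with the same language label,
    # so each group is one monolingual sentence fragment.
    for lang, fragment in groupby(words, key=lang_id):
        tagged.extend(taggers[lang](list(fragment)))
    return tagged


result = tag_code_mixed("I want to go pero no quiero".split())
```

Because the taggers are held in a dictionary keyed by language label, extending the sketch to more than two languages only requires a language identifier that emits additional labels and one more tagger entry per language.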